Challenges and experiences in collecting a chat corpus

Authors

  • Wilbert Spooren
  • Tessa van Charldorp
Abstract

Present-day access to a wealth of electronically available linguistic data creates enormous opportunities for cutting-edge research questions and analyses. Computer-mediated communication (CMC) data are especially interesting, for example because the multimodal character of new media puts our ideas about discourse issues such as coherence to the test. At the same time, CMC data are ephemeral because of rapidly changing technology. That is why we urgently need to collect CMC discourse data before the technology becomes obsolete. This paper describes a number of challenges we encountered when collecting a chat corpus with data from secondary school children in Amsterdam. These challenges are of various kinds: logistical, ethical and technological.

Similar resources

Collecting Natural SMS and Chat Conversations in Multiple Languages: The BOLT Phase 2 Corpus

The DARPA BOLT Program develops systems capable of allowing English speakers to retrieve and understand information from informal foreign language sources. Phase 2 of the program required large volumes of naturally occurring informal text (SMS) and chat messages from individual users in multiple languages to support evaluation of machine translation systems. We describe the design and implement...

International Student Mobility Program (ISMP) Analysis of International Students' Challenges in Iran

In recent decades, internationalization has been one of the most influential developments in higher education systems worldwide. Recruiting international students, one of the most effective strategies for achieving the goals of internationalization, has been a common way to prevail in the competition between countries over the last decade. Therefore, studying these international stude...

Syntactic parsing of chat language in contact center conversation corpus

Chat language is often referred to as computer-mediated communication (CMC). Most previous studies on chat language have been dedicated to collecting "chat room" data, as it is the kind of data that is most accessible on the Web. This kind of data falls under the informal register, whereas in this paper we are interested in understanding the mechanisms of a more formal kind of CMC: dia...

Healthcare Priority-Setting: Chat-Ting Is Not Enough; Comment on “Swiss-CHAT: Citizens Discuss Priorities for Swiss Health Insurance Coverage”

CHAT has its limits. It is a three-hour exercise. However, the real-world problems of healthcare rationing and priority-setting are too complex for a three-hour exercise. What is needed, as a supplement, are sustained processes of rational democratic deliberation that can address the challenges to healthcare justice posed by costly emerging medical technologies, such as these targeted cancer th...

MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse

In this paper, we describe our experience with collecting and creating an annotated corpus of multi-party online conversations in a chat-room environment. This effort is part of a larger project to develop computational models of social phenomena such as agenda control, influence, and leadership in online interactions. Such models will help capture the dialogue dynamics that are essential fo...

Journal:
  • JLCL

Volume 29

Publication date: 2014